Distributed Power-law Graph Computing: Theoretical and Empirical Analysis
Abstract
With the emergence of big graphs in a variety of real applications like social networks, machine learning based on distributed graph-computing (DGC) frameworks has attracted much attention from the big data machine learning community. In DGC frameworks, the graph partitioning (GP) strategy plays a key role in performance, affecting both workload balance and communication cost. Typically, the degree distributions of natural graphs from real applications follow skewed power laws, which makes GP a challenging task. Recently, many methods have been proposed to solve the GP problem. However, the existing GP methods cannot achieve satisfactory performance for applications with power-law graphs. In this paper, we propose a novel vertex-cut method, called degree-based hashing (DBH), for GP. DBH makes effective use of the skewed degree distributions for GP. We theoretically prove that DBH can achieve lower communication cost than existing methods and can simultaneously guarantee good workload balance. Furthermore, empirical results on several large power-law graphs also show that DBH can outperform the state of the art.
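The abstract does not spell out how DBH assigns edges. The following is a minimal sketch, assuming the rule is to hash the lower-degree endpoint of each edge, so that high-degree vertices are the ones cut across machines. The function names `dbh_partition` and `replication_factor`, the modulo "hash" for integer vertex IDs, and the tie-breaking choice are all illustrative assumptions, not the authors' implementation.

```python
from collections import Counter, defaultdict


def dbh_partition(edges, num_machines):
    """Assign each edge of a simple graph to a machine (vertex-cut).

    Assumed rule: place the edge on the machine chosen by hashing
    its lower-degree endpoint. Vertex IDs are assumed to be ints.
    """
    # Count the degree of every vertex from the edge list.
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    assignment = {}
    for u, v in edges:
        # Ties are broken in favour of the first endpoint (an arbitrary choice).
        anchor = u if degree[u] <= degree[v] else v
        # Toy hash: modulo on the integer vertex ID.
        assignment[(u, v)] = anchor % num_machines
    return assignment


def replication_factor(assignment):
    """Average number of machines on which each vertex appears."""
    replicas = defaultdict(set)
    for (u, v), machine in assignment.items():
        replicas[u].add(machine)
        replicas[v].add(machine)
    return sum(len(m) for m in replicas.values()) / len(replicas)
```

On a star graph with hub 0 and four leaves split over two machines, this sketch replicates only the hub, which illustrates why cutting high-degree vertices can keep communication cost low while spreading edges evenly.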
Similar resources
Distributed Power-law Graph Computing: Theoretical and Empirical Analysis
Typically, a large-scale natural graph follows a skewed power law. In distributed graph-structured computations, the skewness usually leads to poor partitioning, which causes high communication cost and workload imbalance. Therefore, graph partitioning (GP) is a challenging issue. To tackle this challenge, we introduce degree-based techniques into GP via vertex-cut. Accordingly, we develop a nov...
Matrix Completion from Power-Law Distributed Samples
The low-rank matrix completion problem is a fundamental problem with many important applications. Recently, [4],[13] and [5] obtained the first non-trivial theoretical results for the problem assuming that the observed entries are sampled uniformly at random. Unfortunately, most real-world datasets do not satisfy this assumption, but instead exhibit power-law distributed samples. In this paper,...
Use of Structure Codes (Counts) for Computing Topological Indices of Carbon Nanotubes: Sadhana (Sd) Index of Phenylenes and its Hexagonal Squeezes
Structural codes vis-a-vis structural counts, like polynomials of a molecular graph, are important in computing graph-theoretical descriptors, which are commonly known as topological indices. These indices are most important for characterizing carbon nanotubes (CNTs). In this paper we have computed the Sadhana index (Sd) for phenylenes and their hexagonal squeezes using structural codes (counts). Sa...
Why Do Cascade Sizes Follow a Power-Law?
We introduce a random directed acyclic graph and use it to model the information diffusion network. Subsequently, we analyze the cascade generation model (CGM) introduced by Leskovec et al. [19]. Until now, only empirical studies of this model have been done. In this paper, we present the first theoretical proof that the sizes of cascades generated by the CGM follow the power-law distribution, which is...
A new Shuffled Genetic-based Task Scheduling Algorithm in Heterogeneous Distributed Systems
Distributed systems such as Grid and Cloud Computing provision web services to their users all over the world. One of the most important concerns that service providers encounter is handling the total cost of ownership (TCO). A large part of TCO is related to power consumption due to inefficient resource management. The task scheduling module, as a key component, can have a drastic impact on both user...